
Spark dayname function implementation #20825

Open
kazantsev-maksim wants to merge 22 commits into apache:main from kazantsev-maksim:spark_dayname

Conversation

@kazantsev-maksim
Contributor

Which issue does this PR close?

N/A

Rationale for this change

Add new spark function: https://spark.apache.org/docs/latest/api/sql/index.html#dayname

What changes are included in this PR?

  • Implementation
  • SLT tests

Are these changes tested?

Yes, tests added as part of this PR.

Are there any user-facing changes?

No, this is a new function.

@github-actions bot added the sqllogictest (SQL Logic Tests, .slt) and spark labels on Mar 9, 2026

query T
SELECT dayname('2008-02-20'::DATE);
----
Contributor

Should we also verify that the data type matches the Spark data types (Utf8 / LargeUtf8 / Utf8View)?

Contributor

@coderfender left a comment

minor comments

fn spark_day_name(days: i32) -> String {
let weekday = Date32Type::to_naive_date_opt(days).unwrap().weekday();
let display_name = get_display_name(weekday.num_days_from_monday());
display_name.unwrap()
Contributor

Potential panic in `unwrap`?

Contributor Author

I tried to improve it

}

fn spark_day_name(days: i32) -> String {
let weekday = Date32Type::to_naive_date_opt(days).unwrap().weekday();
Contributor

Here as well? Could we guarantee that `unwrap` will never panic?

Contributor Author

I tried to improve it
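One panic-free shape for this lookup keeps everything in `Option` so no `unwrap` is needed. This is a sketch, not the PR's final code; `day_name_from_days` is a hypothetical helper that exploits the fact that Date32 counts days since 1970-01-01, which was a Thursday:

```rust
/// Sketch of a panic-free day-name lookup for a Date32 day count.
/// `day_name_from_days` is a hypothetical helper, not the PR's code.
fn day_name_from_days(days: i32) -> Option<&'static str> {
    // Day 0 (1970-01-01) was a Thursday; with Monday = 0, Thursday = 3.
    // rem_euclid keeps the index non-negative for dates before the epoch.
    let idx = (days as i64 + 3).rem_euclid(7) as usize;
    ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"].get(idx).copied()
}

fn main() {
    assert_eq!(day_name_from_days(0), Some("Thu")); // 1970-01-01
    assert_eq!(day_name_from_days(-1), Some("Wed")); // 1969-12-31
    println!("ok");
}
```

Returning `Option` (or propagating a `DataFusionError`) lets the caller decide between NULL output and an error, instead of crashing the query.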

Contributor

@coderfender left a comment

minor comments

@coderfender
Contributor

Test failures:

caused by
Arrow error: Cast error: Cannot cast string '' to value of Date32 type
[SQL] SELECT dayname(''::STRING);
at /__w/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/datetime/dayname.slt:68

@kazantsev-maksim
Contributor Author

Thanks for the review @coderfender. Could you please take another look when you have time?

Contributor

@coderfender left a comment

I think the changes are looking good, @kazantsev-maksim. I left a couple of questions for further review. Thank you very much for your patience :)


query T
SELECT dayname('2010-04-24'::TIMESTAMP);
----
Contributor

Could we also add tests with actual timestamps instead of dates parsed as timestamps here?

DataType::Date32 | DataType::Timestamp(_, _) => spark_day_name_inner(array),
DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8 => {
let date_array =
cast_with_options(array, &DataType::Date32, &CastOptions::default())?;
Contributor

What would happen in the case of an invalid string here? We could probably handle it and throw an error for malformed input?

Contributor Author

Invalid strings will be replaced with null values.
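This follows from the safe cast above: in arrow-rs, `CastOptions::default()` has `safe: true`, so values that fail to parse become nulls rather than errors. A self-contained sketch of that behavior (`parse_date_lenient` is a hypothetical stand-in for the arrow cast, not the actual implementation):

```rust
/// Hypothetical stand-in for a "safe" string-to-date cast: malformed
/// input becomes None (NULL) rather than an error.
fn parse_date_lenient(s: &str) -> Option<(i32, u32, u32)> {
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 {
        return None;
    }
    let y: i32 = parts[0].parse().ok()?;
    let m: u32 = parts[1].parse().ok()?;
    let d: u32 = parts[2].parse().ok()?;
    // A real cast also validates month lengths and leap years; this
    // sketch only checks the obvious ranges.
    ((1..=12).contains(&m) && (1..=31).contains(&d)).then_some((y, m, d))
}

fn main() {
    assert_eq!(parse_date_lenient("2008-02-20"), Some((2008, 2, 20)));
    assert_eq!(parse_date_lenient(""), None); // NULL, matching the SLT test
    println!("ok");
}
```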

6 => Some(String::from("Sun")),
_ => None,
}
}
Contributor

I don't think we are handling timezone support here? Spark handles the timezone in its current implementation, and this could produce wrong results.

TypeSignatureClass::Timestamp,
)]),
TypeSignature::Coercible(vec![Coercion::new_exact(
TypeSignatureClass::Native(logical_date()),
Contributor

Spark's date (java.util.Date) maps to Date32 on the Rust side. Given that we are not handling Date64 in the match arm, do you still think we should have logical_date() as one of the signatures, since it supports both Date32 and Date64? Perhaps we could change it to Date32?

query T
SELECT dayname('2010-04-24'::TIMESTAMP);
----
Sat
Contributor

We might need tests to cover various timezones (at least one apart from UTC) :)

----
NULL

query T
Member

I tested this in Spark 4.1.1 and found that the empty string behavior depends on ANSI mode:

-- ANSI on (Spark 4 default):
spark-sql> SELECT dayname('');
[CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "DATE" ...

-- ANSI off:
spark-sql> SET spark.sql.ansi.enabled=false;
spark-sql> SELECT dayname('');
NULL

The current implementation returns NULL, which matches the non-ANSI behavior. Since Spark 4 defaults to ANSI mode, it might be worth supporting both behaviors here — maybe through an enable_ansi_mode flag similar to how mod/pmod handle it in the same crate. That way the caller (e.g. Comet or another Spark-compatible engine) can choose whether invalid string-to-date casts should error or return NULL, matching whichever ANSI mode the user has configured.
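The suggested flag could look roughly like this. This is a hypothetical sketch: `enable_ansi_mode`, the error text, and `try_parse_days` are illustrative, not the actual mod/pmod implementation:

```rust
/// Toy stand-in for a string-to-Date32 cast; shape-checks "YYYY-MM-DD" only.
fn try_parse_days(s: &str) -> Option<i32> {
    let p: Vec<&str> = s.split('-').collect();
    if p.len() != 3 {
        return None;
    }
    let _y: i32 = p[0].parse().ok()?;
    let m: u32 = p[1].parse().ok()?;
    let d: u32 = p[2].parse().ok()?;
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(0) // a real cast would compute days since 1970-01-01
}

/// ANSI on: invalid input is an error. ANSI off: invalid input is NULL (None).
fn cast_string_to_date(s: &str, enable_ansi_mode: bool) -> Result<Option<i32>, String> {
    match try_parse_days(s) {
        Some(days) => Ok(Some(days)),
        None if enable_ansi_mode => {
            Err(format!("CAST_INVALID_INPUT: cannot cast '{s}' to DATE"))
        }
        None => Ok(None),
    }
}

fn main() {
    assert!(cast_string_to_date("", true).is_err()); // ANSI on: error
    assert_eq!(cast_string_to_date("", false), Ok(None)); // ANSI off: NULL
    println!("ok");
}
```

With that shape, the caller (e.g. Comet) would construct the function with the flag matching the user's `spark.sql.ansi.enabled` setting.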


Labels

spark, sqllogictest (SQL Logic Tests, .slt)


3 participants